minimum, maximum, and quartiles. Summary statistics for residuals are what you should expect to find
in the residuals section of your software’s output. Here’s what you see in Figure 16-4 at the top under
Residuals:
The minimum and maximum values: These are labeled as Min and Max, respectively, and
represent the two most extreme residuals, that is, the two points that lie farthest away from the
least-squares line in either direction. The minimum is negative, indicating that the point is below the
line, while the positive maximum is above the line. The minimum is almost 21 mmHg below the line,
while the maximum lies about 17 mmHg above the line.
The first and third quartiles: These are labeled 1Q and 3Q on the output. Looking under 1Q,
which is the first quartile, you can tell that about 25 percent of the data points (which would be 5
out of 20) lie more than 4.7 mmHg below the fitted line. For the third quartile results, you see that
another 25 percent lie more than 6.5 mmHg above the fitted line. The remaining 50 percent of the
points lie between those two quartiles.
The median: Labeled Median on the output, a median of –3.4 tells you that half of the residuals,
which is 10 of the 20 data points, are less than –3.4, and half are greater than –3.4. The negative
sign means the median lies below the fitted line.
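As a rough illustration of where these five numbers come from, the following Python sketch fits a least-squares line and summarizes its residuals. The data values and the helper name fit_line are made up for this example and are not the data behind Figure 16-4. Quartile rules also differ slightly between programs, so the "inclusive" method used here is only one reasonable choice.

```python
import statistics

# Hypothetical illustration data; these are NOT the values behind Figure 16-4.
x = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]            # predictor
y = [118, 125, 121, 130, 128, 140, 133, 145, 139, 150]  # outcome, mmHg


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares straight line."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


intercept, slope = fit_line(x, y)

# Residual = observed value minus the value predicted by the fitted line.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Quartile conventions vary between programs; "inclusive" is one common choice.
q1, median, q3 = statistics.quantiles(residuals, n=4, method="inclusive")

summary = {
    "Min": min(residuals),
    "1Q": q1,
    "Median": median,
    "3Q": q3,
    "Max": max(residuals),
}
for label, value in summary.items():
    print(f"{label:>6}: {value:8.3f}")
```

A negative value in this summary means the point lies below the fitted line, and a positive value means it lies above.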
Note: The mean isn’t included in these summary statistics because the mean of the residuals is always
0 (apart from rounding error) for any least-squares regression that includes an intercept term.
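You can check this property numerically. The sketch below uses made-up data (not the data behind Figure 16-4) and compares a least-squares line that has an intercept with one forced through the origin. The variable names are chosen only for this example.

```python
# Hypothetical illustration data; these are NOT the values behind Figure 16-4.
x = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
y = [118, 125, 121, 130, 128, 140, 133, 145, 139, 150]
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares fit WITH an intercept term.
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x
resid_with = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
mean_with = sum(resid_with) / n

# Least-squares fit WITHOUT an intercept (line forced through the origin).
slope0 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
resid_without = [yi - slope0 * xi for xi, yi in zip(x, y)]
mean_without = sum(resid_without) / n

print(f"mean residual, with intercept:    {mean_with:.10f}")
print(f"mean residual, without intercept: {mean_without:.10f}")
```

With the intercept, the mean residual is zero up to floating-point rounding. Without it, the mean is generally not zero.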
The residual standard error, often called the root-mean-square (RMS) error in regression
output, is a measure of how tightly or loosely the points scatter above or below the fitted line.
You can think of it as the standard deviation (SD) of the residuals, although it’s computed in a
slightly different way from the usual SD of a set of numbers. For a straight-line regression, RMS
uses N – 2 instead of N – 1 in the denominator of the SD formula, because two parameters (the
intercept and the slope) are estimated from the data. At the bottom of Figure 16-4, Residual standard
error is expressed as 9.838 mmHg. You can think of it as another summary statistic for residuals.
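The difference between the two denominators can be seen in a short Python sketch. It assumes a straight-line fit (two estimated parameters, hence N – 2) and uses made-up data, so it will not reproduce the 9.838 mmHg from Figure 16-4.

```python
import math
import statistics

# Hypothetical illustration data; these are NOT the values behind Figure 16-4.
x = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
y = [118, 125, 121, 130, 128, 140, 133, 145, 139, 150]
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares straight line with an intercept.
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Residual standard error: sum of squared residuals divided by N - 2.
ss_res = sum(r ** 2 for r in residuals)
rse = math.sqrt(ss_res / (n - 2))

# Usual sample SD of the residuals: statistics.stdev divides by N - 1.
sd_usual = statistics.stdev(residuals)

print(f"residual standard error (N - 2): {rse:.3f}")
print(f"usual SD of residuals   (N - 1): {sd_usual:.3f}")
```

Because N – 2 is smaller than N – 1, the residual standard error is always a little larger than the usual SD of the same residuals. The gap shrinks as the number of data points grows.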
Graphs of the residuals
Most regression programs will produce several kinds of graphs of the residuals if you request them in
your code. You can use these graphs to assess whether the data meet the assumptions of least-squares
straight-line regression. Figure 16-6 shows two of the more common types of residual graphs. The one on the left is
called a residuals versus fitted graph, and the one on the right is called a normal Q-Q graph.
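To show what goes into these two graphs, here is a minimal Python sketch that computes the points each one plots, leaving the drawing to whatever plotting tool you prefer. The data are made up (not those behind Figure 16-6). The plotting positions (i – 0.5)/N used for the normal quantiles are one common convention and an assumption here, since software packages differ on this detail.

```python
from statistics import NormalDist

# Hypothetical illustration data; these are NOT the values behind Figure 16-6.
x = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
y = [118, 125, 121, 130, 128, 140, 133, 145, 139, 150]
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares straight line with an intercept.
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x

fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Residuals versus fitted: one (fitted value, residual) point per observation.
resid_vs_fitted = list(zip(fitted, residuals))

# Normal Q-Q: sorted residuals paired with standard normal quantiles at
# plotting positions (i - 0.5) / n (an assumed convention).
theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
qq_points = list(zip(theoretical, sorted(residuals)))

for fit_value, resid in resid_vs_fitted:
    print(f"fitted {fit_value:7.2f}   residual {resid:7.2f}")
for quantile, resid in qq_points:
    print(f"normal quantile {quantile:6.3f}   residual {resid:7.2f}")
```

In the first set of points you would look for an even scatter around zero across the range of fitted values. In the second, you would look for the points to fall close to a straight line.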